Multi-agent Reinforcement Learning in Sequential Social Dilemmas
Authors
Abstract
Matrix games like Prisoner’s Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: (1) a fruit Gathering game and (2) a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.
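As a rough illustration of the matrix-game baseline that the abstract contrasts with sequential social dilemmas, the following sketch trains two independent learners on a one-shot Prisoner's Dilemma. It is a tabular, stateless stand-in and not the deep Q-networks or the Gathering and Wolfpack games described above. The payoff values, the class name `IndependentQLearner`, the function `train`, and all hyperparameters are assumptions made for this example.

```python
import random

# Assumed Prisoner's Dilemma payoffs (illustrative, not taken from the paper).
# Keys are (my_action, other_action); values are my reward.
COOPERATE, DEFECT = 0, 1
PAYOFF = {
    (COOPERATE, COOPERATE): 3.0,
    (COOPERATE, DEFECT): 0.0,
    (DEFECT, COOPERATE): 5.0,
    (DEFECT, DEFECT): 1.0,
}


class IndependentQLearner:
    """Stateless epsilon-greedy Q-learner (hypothetical helper).

    Each agent keeps its own action values and never sees the other's.
    """

    def __init__(self, rng, alpha=0.05, epsilon=0.1):
        self.q = [0.0, 0.0]      # one value per atomic action
        self.rng = rng
        self.alpha = alpha       # learning rate (assumed)
        self.epsilon = epsilon   # exploration rate (assumed)

    def greedy_action(self):
        # Ties are broken in favor of DEFECT.
        return COOPERATE if self.q[COOPERATE] > self.q[DEFECT] else DEFECT

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice([COOPERATE, DEFECT])
        return self.greedy_action()

    def update(self, action, reward):
        # Move the chosen action's value toward the observed reward.
        self.q[action] += self.alpha * (reward - self.q[action])


def train(steps=20000, seed=0):
    """Let two independent learners play the matrix game repeatedly."""
    rng = random.Random(seed)
    agents = [IndependentQLearner(rng), IndependentQLearner(rng)]
    for _ in range(steps):
        a0, a1 = agents[0].act(), agents[1].act()
        agents[0].update(a0, PAYOFF[(a0, a1)])
        agents[1].update(a1, PAYOFF[(a1, a0)])
    return agents


if __name__ == "__main__":
    for i, agent in enumerate(train()):
        label = "defect" if agent.greedy_action() == DEFECT else "cooperate"
        print(i, agent.q, label)
```

Under these assumed payoffs, defecting yields a higher expected reward than cooperating whatever the other agent does, so both learners should end up preferring to defect. The example only covers the atomic-action case; in a sequential social dilemma the same independent-learner setup would have to learn state-dependent policies instead of a single value per action.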
Similar resources
Inequity aversion resolves intertemporal social dilemmas
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. Howe...
Towards Cooperation in Sequential Prisoner's Dilemmas: a Deep Multiagent Reinforcement Learning Approach
The Iterated Prisoner’s Dilemma has guided research on social dilemmas for decades. However, it distinguishes between only two atomic actions: cooperate and defect. In real-world prisoner’s dilemmas, these choices are temporally extended and different strategies may correspond to sequences of actions, reflecting grades of cooperation. We introduce a Sequential Prisoner’s Dilemma (SPD) game to b...
Emotional Multiagent Reinforcement Learning in Social Dilemmas
Social dilemmas have attracted extensive interest in multiagent system research in order to study the emergence of cooperative behaviors among selfish agents. Without extra mechanisms or assumptions, directly applying multiagent reinforcement learning in social dilemmas will end up with convergence to the Nash equilibrium of mutual defection among the agents. This paper investigates the importa...
Strategic Foresighted Learning in Competitive Multi-Agent Games
We describe a generalized Q-learning type algorithm for reinforcement learning in competitive multi-agent games. We make the observation that in a competitive setting with adaptive agents an agent’s actions will (likely) result in changes in the opponents’ policies. In addition to accounting for the estimated policies of the opponents, our algorithm also adjusts these future opponent policies by...
Consequentialist Conditional Cooperation in Social Dilemmas with Imperfect Information
Social dilemmas, where mutual cooperation can lead to high payoffs but participants face incentives to cheat, are ubiquitous in multi-agent interaction. We wish to construct agents that cooperate with pure cooperators, avoid exploitation by pure defectors, and incentivize cooperation from the rest. However, often the actions taken by a partner are (partially) unobserved or the consequences of i...